ABSTRACT

Least-squares  estimates  of  regression  coefficients 
are  extremely  sensitive  to  large  errors  in  even  a 
single  data  point.  Frequently,  an  ad-hoc  procedure  is 
used  to  weight  the  data  in  a manner  to  alleviate  the 
effects  of  extreme  observations. 

This thesis is a study of the effectiveness of an
iterative  regression  method  using  weights  derived 
through  maximum-likelihood  arguments.  Actual  weights 
are  calculated  on  the  assumption  of  Cauchy-distributed 
error  as  a worst-case  situation  in  which  the  errors 
have  long,  fat  tails  and  no  finite  moments. 
I.  INTRODUCTION


A.  LEAST-SQUARES  LINEAR  REGRESSION 


It is often desirable to model the behavior of a
response variable as a function of another variable,
sometimes referred to as a "carrier", since it carries
information about the dependent variable. In the simplest
case, the equation

    y_i = b_0 + b_1 x_i

is fitted to a set of data points (x_i, y_i). Usually this is
done using the "least-squares" procedure, which selects the
coefficients b_0 and b_1 that minimize the sum of squared
residuals, r_i, defined as

    r_i = y_i - b_0 - b_1 x_i

The procedure is based on the linear model

    y_i = b_0 + b_1 x_i + \epsilon_i


where the \epsilon_i are independent and identically distributed
random variables with mean zero and constant (but unknown)
variance. Then, by the Gauss-Markov Theorem, the estimates
\hat{b}_0 and \hat{b}_1 found by solving the "normal equations"

    \sum_i y_i = n b_0 + b_1 \sum_i x_i

    \sum_i x_i y_i = b_0 \sum_i x_i + b_1 \sum_i x_i^2

are the best (minimum variance) linear unbiased estimates of
b_0 and b_1.
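
As an illustration of the normal equations, the following is a minimal Python sketch (NumPy assumed; the function name `least_squares_line` and the noise-free data are hypothetical, chosen only for the example). It builds the 2 x 2 system directly from the sums that appear in the two equations and solves it for b_0 and b_1.

```python
import numpy as np

def least_squares_line(x, y):
    """Solve the two normal equations for the intercept b0 and slope b1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Normal equations:
    #   sum(y)   = n*b0      + b1*sum(x)
    #   sum(x*y) = b0*sum(x) + b1*sum(x^2)
    A = np.array([[n, x.sum()],
                  [x.sum(), (x * x).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    b0, b1 = np.linalg.solve(A, rhs)
    return b0, b1

# Illustrative, noise-free data lying exactly on y = 3 + 0.5 x.
x = np.arange(1, 21)
y = 3.0 + 0.5 * x
b0, b1 = least_squares_line(x, y)
```

With error-free data the solution reproduces the generating line; the sensitivity discussed in the next section appears once a single y value is displaced by a large amount.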

B.  DEFICIENCIES  OF  LEAST-SQUARES 

The least-squares procedure works very well when the \epsilon_i
are short-tailed and the other assumptions about the error
distribution hold. If, however, the error distribution has
very long tails, implying that extreme observations may well
occur, least-squares quickly demonstrates its sensitivity to
large random error. In real data, the analyst very rarely
has a hint as to the nature of the true distribution of the
\epsilon_i. Heuristic arguments appealing to the central limit
theorem are frequently made along the line that there are
several sources of variability in the data, which, in the
aggregate, will be "normally" distributed and thus suitable
for least-squares. Unfortunately, if any of the errors are
long-tailed (such as may be described by the Cauchy
distribution), then their aggregate effect will not be
normal.
Figure 10 and Figure 11 in the Appendix are histograms
of least-squares estimates for b_0 and b_1 in the linear model

    y_i = 22 + 2(x_i - \bar{x}) .

The \epsilon_i for these estimates came from a Normal, or Gaussian,
distribution, for which the least-squares procedure is
optimal. Figure 12 and Figure 13 show the effects of
Cauchy-distributed error on the estimates of these
coefficients. The end cells contain points which would
otherwise be off the scale of the histogram, and emphasize
that large errors in estimating the coefficients are quite
possible when using least-squares. A uniform distribution's
adverse effect upon the coefficient estimates is shown in
Figure 14 and Figure 15.


A function of Cauchy variates C,

    V = C e^{C/100}

is the error density associated with the widely-varying
coefficient estimates histogrammed in Figure 16 and Figure
17. This distribution is virtually symmetric between ±12,
but has a long tail extending toward +\infty. Another
distribution of error, a function of normal variates N,

    Z = N e^{N + 0.01 N^2}

has high positive skewness, a little bias, and an adverse
effect upon the least-squares estimates for b_0 and b_1 as
shown in Figure 18 and Figure 19.

All of these cases demonstrate that the variances of the
coefficient estimates may be drastically increased by the
presence of non-Gaussian, and especially long-tailed,
distributions of error. While the bulk of the estimates do
indeed fall near the actual values, there is clearly an
unacceptable probability of obtaining an extreme estimate
when using the least-squares procedure.


C.  USE  OF  THE  CAUCHY  DISTRIBUTION


Data disturbed by Cauchy-distributed error, with long,
thick tails and lack of finite moments, may be considered an
extremely difficult case for regression techniques to treat
reliably. A procedure that works well for data subjected to
such extremely straggly-tailed errors can reasonably be
expected to work well, though not necessarily optimally, in
many curve-fitting situations. This thesis uses
maximum-likelihood estimates for regression coefficients to
develop a robust regression procedure, then further assumes
a Cauchy-distributed error to apply a specific technique to
a series of controlled regression problems.
II.  SINGLE-CARRIER  ROBUST  REGRESSION


A.  MAXIMUM-LIKELIHOOD  ESTIMATORS

The procedure to be presented is based upon the linear
model

    y_i = b_0 + b_1 (x_i - \bar{x}) + \epsilon_i

The \epsilon_i are assumed to be independent, identically
distributed random variables centered at zero with spread s
(a scale parameter) and having a density of the form

    \frac{1}{s} f\left( \frac{\epsilon}{s} \right)

The probability for any single observation y_i may be
expressed as

    p(y_i) = \frac{1}{s} f\left( \frac{y_i - b_0 - b_1 (x_i - \bar{x})}{s} \right)

The likelihood function for n observations is the product of
n of the above probabilities. Taking logarithms, the
log-likelihood function is then

    L(b_0, b_1, s) = \sum_i \ln f\left( \frac{y_i - b_0 - b_1 (x_i - \bar{x})}{s} \right) - n \ln s

Partial derivatives are taken with respect to b_0, b_1 and
s, and all set equal to zero to find the b_0, b_1 and s
which maximize L. Using the residuals
r_i = y_i - b_0 - b_1 (x_i - \bar{x}), and with \psi(x) defined as

    \psi(x) = -\frac{d}{dx} \ln f(x) ,

the three equations obtained from setting the partial
derivatives of L to zero may be written as

    \sum_i \psi(r_i / s) = 0

    \sum_i (x_i - \bar{x}) \psi(r_i / s) = 0

    \frac{1}{n} \sum_i \frac{r_i}{s} \psi(r_i / s) - 1 = 0
This system of non-linear equations may be solved
iteratively. Defining w_i^{(j)} as

    w_i^{(j)} = \frac{\psi\left( r_i^{(j-1)} / s_{j-1} \right)}{r_i^{(j-1)} / s_{j-1}}

where the superscript (j) refers to the number of
iterations, the equations at the j-th iteration are

    \sum_i w_i^{(j)} (y_i - b_0 - b_1 (x_i - \bar{x})) = 0

    \sum_i w_i^{(j)} (x_i - \bar{x}) (y_i - b_0 - b_1 (x_i - \bar{x})) = 0

    s_j^2 = \frac{1}{n} \sum_i w_i^{(j)} \left[ r_i^{(j-1)} \right]^2

The first two equations are simply weighted least-squares
normal equations which may be solved by standard iterative
weighted least-squares algorithms in which the weights for
each subsequent iteration are calculated from the above
expression for w_i^{(j)}.

Assuming the error to be Cauchy-distributed, the density is
f(x) = 1 / (\pi (1 + x^2)), so that \psi(x) = 2x / (1 + x^2)
and the weighting formula becomes

    w_i^{(j)} = \frac{2}{1 + \left( r_i^{(j-1)} / s_{j-1} \right)^2}
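
The weighting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis program: it takes the weight to be \psi(u)/u with u = r/s, which for the standard Cauchy density gives 2/(1 + u^2), and the function names `cauchy_weights` and `weighted_step` are hypothetical. NumPy is assumed.

```python
import numpy as np

def cauchy_weights(residuals, s):
    """Weights psi(u)/u for u = r/s, with psi(u) = 2u/(1 + u^2) (Cauchy density)."""
    u = np.asarray(residuals, dtype=float) / s
    return 2.0 / (1.0 + u * u)

def weighted_step(x, y, w):
    """Solve the two weighted normal equations for (b0, b1) on centered carriers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                      # carriers enter the model as (x_i - xbar)
    A = np.array([[w.sum(), (w * xc).sum()],
                  [(w * xc).sum(), (w * xc * xc).sum()]])
    rhs = np.array([(w * y).sum(), (w * xc * y).sum()])
    b0, b1 = np.linalg.solve(A, rhs)
    return b0, b1

# A residual ten spreads from the fit receives about 1% of the weight
# given to a residual of zero.
w_demo = cauchy_weights(np.array([0.0, 1.0, 10.0]), s=1.0)
```

The weight never reaches zero, so every observation keeps some influence; distant points are simply discounted in proportion to the square of their scaled residual.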
B.  INITIAL  ESTIMATES


It is necessary to begin the iterative process with an
initial estimate, or guess, of the values of the
coefficients. A robust estimate suggested by D. F. Andrews
[1] using the median provides an estimate which is
insensitive to arbitrarily large disturbances in up to 25%
of the data.

The first coefficient, b_0 (which corresponds to the mean
of the y_i in least-squares estimation), is estimated by the
median of the y_i:

    \hat{b}_0 = median(y_i)

Next, the carriers, (x_i - \bar{x}), are ordered and then broken up
into three groups of approximately equal size. Of interest
are the upper group of carriers, x_U, the lower carriers, x_L,
and the y_i corresponding to the (x_i - \bar{x}) in each group
(y_U and y_L).

The estimate for b_1 is a rough slope computed from the
medians of the four groups:

    \hat{b}_1 = \frac{median(y_U) - median(y_L)}{median(x_U) - median(x_L)}
The median of the absolute values of the residuals from
these estimates is the initial guess for s:

    s_0 = median | y_i - \hat{b}_0 - \hat{b}_1 (x_i - \bar{x}) |

Weights w_i^{(1)} are calculated from the residuals and s_0. The
algorithm then proceeds until the values of the coefficients
stabilize.
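
The starting values described in this section can be sketched in Python as below. Two points are assumptions rather than statements from the text: the slope is taken as the difference of the group medians of y divided by the difference of the group medians of the centered carrier, and each outer group holds n // 3 points. The function name `initial_estimates` is hypothetical; NumPy is assumed.

```python
import numpy as np

def initial_estimates(x, y):
    """Median-based starting values (b0, b1, s0) for the iterative fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    b0 = np.median(y)                       # centre of the y values
    order = np.argsort(xc)
    third = len(xc) // 3                    # assumed size of each outer group
    lower, upper = order[:third], order[-third:]
    # Rough slope from the medians of the four groups (assumed form).
    b1 = ((np.median(y[upper]) - np.median(y[lower]))
          / (np.median(xc[upper]) - np.median(xc[lower])))
    s0 = np.median(np.abs(y - b0 - b1 * xc))  # initial spread
    return b0, b1, s0

# Points on y = 22 + 2(x - xbar), with one grossly disturbed observation.
x = np.arange(1, 21)
y = 22.0 + 2.0 * (x - x.mean())
y[9] += 1000.0
b0_init, b1_init, s0_init = initial_estimates(x, y)
```

In this example the disturbed point falls in the middle third of the carriers, so it does not enter the slope estimate at all, and it shifts the median of the y values only slightly.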


C.  SUMMARY  OF  PROCEDURE 


1.  Overall Effect

Figure  1 is  a typical  scatterplot  of  data  which 
includes  extreme  observations,  or  "outliers",  and  sketches 
of  representative  least-squares  and  robust  fits.  The  effect 
of  the  weighting  procedure  is  to  pull  the  extreme 
observations  in  closer  to  the  bulk  of  the  data,  reducing 
their  tendency  to  distort  the  fit  (note  least-squares  line) . 
It  should  be  noted  that  both  the  response  variable  and  the 
carriers  are  weighted  in  this  technique. 


2.  Solution Not Unique

There are cases in which the robust procedure may
not converge to a single global solution. Since the
solution to the weighted normal equations is actually the
solution to the three non-linear equations obtained by
setting the partial derivatives of L equal to zero, there
exists the possibility of converging to a local solution not
optimizing b_0 and b_1. Figure 2 is an example of a local
solution. The scatterplot represents data which actually
has two separate means (the data might be drive-in movie
attendance, where observations were made only on Wednesdays
and Saturdays). A least-squares fit approximately splits
the two groups of points as indicated. A robust fit may
also split the data, but could converge to one of the two
clusters if either is sparse enough to cause the weighting
process to treat its points as outliers.

3.  Algorithm


The following flow chart depicts the algorithm for
the Cauchy-weighting regression method. The criterion for
convergence (change in both coefficients of less than 0.01%
from one iteration to the next) was somewhat arbitrary, but
was set to meet practical expectations in analysis problems
and not consume excessive amounts of computer time.


Figure  3 - ALGORITHM  FLOWCHART 
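
Since the flowchart itself is not reproduced here, the loop it describes can be sketched in Python as follows. This is a sketch under assumptions, not the original program: the weights are taken as 2/(1 + (r/s)^2), the spread is updated as the square root of the weighted mean squared residual, convergence is tested as a relative change of at most 0.01% in both coefficients, and the starting values are supplied by the caller. The function name and the example data are hypothetical; NumPy is assumed.

```python
import numpy as np

def cauchy_regression(x, y, b0, b1, s, tol=1e-4, max_iter=7):
    """Iteratively reweighted least squares with Cauchy weights.

    b0, b1, s are starting values (for example, median-based estimates);
    tol=1e-4 corresponds to a 0.01% relative change in both coefficients.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    n = y.size
    for iteration in range(1, max_iter + 1):
        r = y - b0 - b1 * xc                       # residuals from previous fit
        w = 2.0 / (1.0 + (r / s) ** 2)             # Cauchy weights
        A = np.array([[w.sum(), (w * xc).sum()],
                      [(w * xc).sum(), (w * xc * xc).sum()]])
        rhs = np.array([(w * y).sum(), (w * xc * y).sum()])
        new_b0, new_b1 = np.linalg.solve(A, rhs)   # weighted normal equations
        s = np.sqrt((w * r * r).sum() / n)         # updated spread
        converged = (abs(new_b0 - b0) <= tol * abs(b0)
                     and abs(new_b1 - b1) <= tol * abs(b1))
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1, s, iteration

# Illustrative data near y = 22 + 2(x - xbar) with one gross outlier.
x = np.arange(1, 21)
y = 22.0 + 2.0 * (x - x.mean()) + 0.1 * np.sin(x)
y[4] += 500.0
fit_b0, fit_b1, fit_s, n_iter = cauchy_regression(x, y, 22.5, 1.9, 1.0)
```

The relative-change test is undefined when a coefficient is exactly zero, and the spread update would fail if every residual were zero; a production routine would need guards for both cases.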
D.  INADEQUACY  OF  R^2


One of the measures of adequacy of fit for least-squares
regression is R^2, the amount of variance explained by the
regression. It is the ratio

    R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}

where

    \hat{y}_i = b_0 + b_1 (x_i - \bar{x}) .

For least-squares, R^2 is a fraction between 0 and 1, but for
a robust procedure, the above ratio may exceed 1. This
occurs when the robust fit is "farther" from the mean of the
data than the least-squares fit.

Consider the following set of observations.

    y    3.75    6.00    7.00    8.00    10.25

    x    1.00    2.00    4.00    6.00     7.00

The mean of the y_i is 7.00 and a least-squares fit of the
model y_i = b_0 + b_1 x_i to the data yields b_0 = 3.385,
b_1 = 0.904 and R^2 = 0.919. A robust fit would reduce the
effects of observations (2.00, 6.00) and (6.00, 8.00) since
they lie somewhat off the line through the other three
points. A robust procedure, bringing these "extreme"
observations in closer to the rest of the data, might well
yield coefficients of b_0 = 3.000 and b_1 = 1.000. From these
coefficients and the data, R^2 = 1.124. Figure 4 is a
scatterplot of the observations with drawings of the actual
least-squares fit and the postulated robust fit. Note that
the two fits are very close, but more importantly, that the
robust fit is rotated so that (\hat{y}_i - \bar{y})^2 for the
robust fit is everywhere greater than or equal to the same
measure for the least-squares fit.
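
The arithmetic of this example can be checked with a short Python sketch (NumPy assumed; the helper name `r_squared` is hypothetical). It fits the least-squares line to the five observations, for which the slope comes out near 0.904, and evaluates the same ratio for the postulated robust coefficients b_0 = 3 and b_1 = 1.

```python
import numpy as np

x = np.array([1.00, 2.00, 4.00, 6.00, 7.00])
y = np.array([3.75, 6.00, 7.00, 8.00, 10.25])

def r_squared(b0, b1):
    """Ratio of fitted-value variation to observed variation about the mean of y."""
    fitted = b0 + b1 * x
    return ((fitted - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# np.polyfit returns the highest-degree coefficient first: (slope, intercept).
b1_ls, b0_ls = np.polyfit(x, y, 1)
r2_ls = r_squared(b0_ls, b1_ls)

# Postulated robust fit from the text.
r2_robust = r_squared(3.0, 1.0)
```

The ratio is bounded by 1 only when the coefficients minimize the sum of squared residuals; for any other line the numerator is free to exceed the denominator, as it does here.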
More generally, as in Figure 1, R^2 as calculated above
is small due to the large deviations from the mean caused by
outliers. When a response variable has only a single
carrier, a plot of the data and the fitted line provides a
visual evaluation of the fit. In multivariable cases, it is
usually impossible to plot the data meaningfully, and the
good fit of the robust line to the bulk of the data could be
belied by inappropriately using R^2 as a measure of the fit.

III.  EXPERIMENTAL  PROCEDURE  FOR  A SINGLE  CARRIER


A.  GENERAL  DESCRIPTION


A "true" model

    y_i = 22 + 2(x_i - \bar{x}) + \epsilon_i

was established to enable comparisons of the Cauchy
weighting technique and least-squares. The x_i were the
integers 1 through 20, and random variates were selected
from one of five controlled error distributions to produce
20 observations of the y_i. The y_i were then regressed on
the (x_i - \bar{x}) to obtain estimates for b_0 and b_1 which
could be compared to the actual values.

One thousand replications were made for each
distribution and each method. Histograms were constructed
for both b_0 and b_1 to reveal their distributions.

Preliminary runs indicated that most problems converged
(both coefficients changed less than 0.01% from one
iteration to the next) within 10 iterations. To reduce the
amount of time to perform the experiment, the
Cauchy-weighting iterations were terminated no later than
the seventh iteration. Values of the coefficients were
recorded at the fourth iteration to see if there were
significant changes between that and the final iteration.
If the problem converged early, data normally collected at a
later point were assigned the stabilized values.
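
The replication scheme for the least-squares half of the experiment can be sketched in Python as follows. The function name, the seed, and the use of NumPy's random generators are assumptions made for illustration; the original random-number source is not described. Because the carrier is centered, the least-squares intercept is simply the mean of the y_i.

```python
import numpy as np

def simulate_least_squares(draw_errors, replications=1000, seed=0):
    """Replicate least-squares fits of y = 22 + 2(x - xbar) + error.

    draw_errors(rng, n) must return n error variates; returns an array of
    shape (replications, 2) holding the (b0, b1) estimates.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(1, 21, dtype=float)
    xc = x - x.mean()
    estimates = np.empty((replications, 2))
    for k in range(replications):
        y = 22.0 + 2.0 * xc + draw_errors(rng, x.size)
        b1 = (xc * y).sum() / (xc * xc).sum()   # slope on the centered carrier
        b0 = y.mean()                           # intercept for a centered carrier
        estimates[k] = (b0, b1)
    return estimates

normal_fits = simulate_least_squares(lambda rng, n: rng.normal(0.0, 1.4826, n))
cauchy_fits = simulate_least_squares(lambda rng, n: rng.standard_cauchy(n))
```

Histograms of the two columns of each result correspond to the coefficient distributions examined in the experiment; the spread of `cauchy_fits` is typically far wider than that of `normal_fits`.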


B.  ERROR  DISTRIBUTIONS


Five controlled error distributions were used to disturb
the observations. The first, the Gaussian or "Normal"
distribution with mean zero, was matched to the second, a
Cauchy distribution. This was done by integrating the
Cauchy density centered at zero to find the 75th quantile,
giving 1 as a measure of the spread of the distribution.
Since the corresponding Normal(0,1) 75th quantile is 0.6745,
a Normal distribution with standard deviation 1.4826 will
have the same interquartile range as a Cauchy distribution
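with spread parameter 1. The third source of error was a
uniform distribution with mean zero, scaled from the variance
of the above Normal to give it a range of -13.1886 to 13.1886.
(This half-width is six times the Normal variance,
1.4826^2 = 2.1981; the resulting standard deviation,
13.1886/\sqrt{3} \approx 7.61, is well above the Normal's 1.4826.)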


The "V" density is a function of Cauchy variates C:

    V = C e^{C/100}

It is positively skewed, but virtually symmetric between -18
and +18 with a very pronounced central spike. Figure 5 is a
histogram of 2000 "V" variates.

The final test density is a function of Normal variates
N with mean zero and variance 1:

    Z = N e^{N + 0.01 N^2}

It is positively skewed and slightly biased. A histogram of
2000 "Z" variates is shown in Figure 6.
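
The five error sources can be generated with a short Python sketch such as the one below. The function name `draw_errors` and the seed are hypothetical, NumPy is assumed, and the exponents used for "V" and "Z" follow one reading of the printed expressions (C/100 and N + 0.01 N^2), which should be treated as an assumption.

```python
import numpy as np

def draw_errors(name, size, rng):
    """Draw `size` variates from one of the five test error distributions."""
    if name == "normal":
        return rng.normal(0.0, 1.4826, size)        # quartiles at -1 and +1
    if name == "cauchy":
        return rng.standard_cauchy(size)            # spread parameter 1
    if name == "uniform":
        return rng.uniform(-13.1886, 13.1886, size)
    if name == "V":
        c = rng.standard_cauchy(size)
        return c * np.exp(c / 100.0)                # assumed form V = C e^(C/100)
    if name == "Z":
        n = rng.standard_normal(size)
        return n * np.exp(n + 0.01 * n * n)         # assumed form Z = N e^(N + 0.01 N^2)
    raise ValueError(f"unknown distribution: {name}")

rng = np.random.default_rng(0)
samples = {name: draw_errors(name, 2000, rng)
           for name in ("normal", "cauchy", "uniform", "V", "Z")}
```

Very large Cauchy draws make the exponential in "V" enormous, which is consistent with the occasional astronomically large least-squares estimates reported for that distribution.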
Figure 5 - "V" HISTOGRAM


Figure 6 - "Z" HISTOGRAM


IV.  RESULTS  OF  SINGLE-CARRIER  EXPERIMENT


A.  LEAST-SQUARES  ADVANTAGES 


Summary statistics for the distributions of b_0 and b_1
for the single-carrier experiment are shown in Figures 7 and
8. Looking at means and standard deviations, least-squares
estimates (maximum-likelihood estimates for normal-error
data) are better for normally-distributed cases than the
Cauchy method. Interestingly, least-squares is also
noticeably better when the error comes from a uniform
distribution. This result could be explained by the
relatively broad area in which the data points may fall for
the uniform error with respect to the range of the (x_i - \bar{x})
used, and the susceptibility of the Cauchy method to
convergence to local solutions.
Figure 7 - SUMMARY OF SINGLE-CARRIER b_0 DISTRIBUTION

True value = 2

Figure 8 - SUMMARY OF SINGLE-CARRIER b_1 DISTRIBUTION

Figure 9 is a diagram of the region in which data
points  may  fall  when  the  error  is  uniform  between 
±13.1886  . Since  the  observations  may  lie  anywhere  in  the 
region  with  equal  probability,  the  weighting  process  may  not 
be  able  to  clearly  discriminate  which  points  are  outliers. 
Chance  alignments  of  a series  of  points  could  determine  a 
local  optimum  upon  which  the  Cauchy-likelihood  method  would 
converge.  While  other  distributions  have  longer  tails,  the 
bulk  of  their  variates  fall  within  a relatively  small 
distance  of  their  center,  better  defining  a mean  trend  and 
clearly  differentiating  outliers  from  the  rest  of  the 
observations.
B.  CAUCHY  METHOD  ADVANTAGES


Comparing means and standard deviations for the two
methods, the Cauchy-likelihood procedure is clearly the more
reliable  technique  when  the  errors  have  long  tails.  Maximum 
and  minimum  estimates  of  the  coefficients  are  closer  to 
their  true  values  when  the  Cauchy  technique  is  used,  and  it 
never  produces  extreme  estimates.  There  is  little 
difference  in  the  estimates  from  the  fourth  to  the  seventh 
iterations. 


C.  SIMILARITIES 


The means and medians for both methods are not
significantly different. The coefficient estimates between
the 25th and 75th quantiles are virtually the same over all
distributions, the Cauchy-based method doing better for the
long-tailed distributions and the normal-theory
(least-squares) method having some advantage principally
when the error is uniform. Even between the 10th and 90th
quantiles, the two procedures yield very similar results.
V.  REGRESSION  ON  TWO  INDEPENDENT  CARRIERS


A.  MULTIVARIATE  MODEL


For two independent carriers, the linear model is
assumed to be of the form

    y_i = b_0 + b_1 (x_{i1} - \bar{x}_1) + b_2 (x_{i2} - \bar{x}_2) + \epsilon_i

The regression can be expressed in matrix terms as

    Y = XB + E.

Y is an n x 1 matrix of n response variable observations, X is
an n x p matrix having all 1's in the first column, the n
observations of the first carrier in the second column, and
the n observations of each of the remaining p-2 carriers in
each of the remaining columns. B is a p x 1 matrix of
coefficients, b_0, b_1, b_2, ..., b_{p-1}, and E is an n x 1
matrix of unknown random errors, independent and identically
distributed, centered at zero and having constant spread.

Using a prime (') to designate the transpose of a
matrix, the least-squares normal equations are

    X'Y = X'XB

The weighted normal equations are

    (WX)'Y = (WX)'XB

where W is an n x n matrix having as its diagonal elements w_{ii}
the k-th iteration weights, w_i^{(k)}, and zeros elsewhere. The
weighted normal equations are equivalent to a system of p
equations in p unknowns (the b_i, i = 0, 1, ..., p-1) which can
be written as

    U = VB

with U = (WX)'Y and V = (WX)'X, and is easily solved for B.
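
The matrix form of the weighted normal equations can be sketched in Python as follows (NumPy assumed; the function name, the seed, and the example weights are hypothetical). The sketch forms the two products on either side of the weighted normal equations and solves the resulting p x p system.

```python
import numpy as np

def weighted_normal_solve(X, y, w):
    """Solve (WX)'Y = (WX)'X B for B, with W = diag(w)."""
    W = np.diag(w)
    WX = W @ X
    U = WX.T @ y          # right-hand side, (WX)'Y
    V = WX.T @ X          # p x p coefficient matrix, (WX)'X
    return np.linalg.solve(V, U)

# Two-carrier example with noise-free responses.
rng = np.random.default_rng(1)
x1 = np.arange(1, 21, dtype=float)
x2 = rng.permutation(x1)
X = np.column_stack([np.ones(20), x1 - x1.mean(), x2 - x2.mean()])
y = X @ np.array([13.0, 3.0, -0.5])
B_unit = weighted_normal_solve(X, y, np.ones(20))            # ordinary least squares
B_weighted = weighted_normal_solve(X, y, rng.uniform(0.1, 2.0, 20))
```

With unit weights the system reduces to the ordinary normal equations; with error-free responses any set of positive weights returns the same coefficients.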

B.  MODIFICATION  TO  INITIAL  ESTIMATION  PROCEDURE

Multiple-carrier regression problems require a
modification to the initial estimate procedure to ensure
that any interdependence among the carriers is removed prior
to estimating the effects of the carriers on the response
variable. D. F. Andrews [1] has suggested a rather
time-consuming method applying a robust sweep operator to
the columns of the X matrix in an iterative process. An
alternate method inspired by Mosteller and Tukey [6]
sequentially regresses the carriers on each of their
predecessors in the X-matrix to eliminate unwanted effects.
Multiple regression may be viewed as a sequence of
single-carrier regressions in which the dependent variable,
y, is regressed on the first carrier alone according to the
model

    y = \gamma_1 x_1 + residual

Let y_{;1} ("y adjusted for x_1") be the residual after the
effects of x_1 are removed:

    y_{;1} = y - \gamma_1 x_1

This residual is set aside while the effect of x_1 on x_2 is
removed using the model

    x_2 = d_{2;1} x_1 + x_{2;1}

x_{2;1} being "x_2 adjusted for x_1". The residual of y
(y_{;1}) is then regressed on x_{2;1} to find b_2 in the model

    y_{;1} = b_2 x_{2;1} + residual

Substituting for y_{;1} and x_{2;1}, b_1 = (\gamma_1 - d_{2;1} b_2), so
that

    y = b_1 x_1 + b_2 x_2 + residual

For a model having a mean effect, the corresponding form is

    y_i = b_0 + b_1 (x_{i1} - \bar{x}_1) + b_2 (x_{i2} - \bar{x}_2) + residual
In practice, b_0 is found immediately (from the median of
the y_i, as before) and its effects are not removed, for
computational considerations. It is not important to remove
the mean effect since it is independent of the carrier
effects that must be removed.

Estimates for \gamma_i are found using the median estimate
described above:

    \hat{\gamma}_i = \frac{median(y_{aU}) - median(y_{aL})}{median(x_{aUi}) - median(x_{aLi})}

where the "a" subscript indicates the quantities have been
adjusted for all preceding carriers. For example, the
estimate for \hat{\gamma}_3 would require that y_i and x_{i3} both be
adjusted for the effects of carrier x_1 and carrier x_{2;1}. A
similar procedure finds the d_{j;a} coefficients for the j-th
carrier regressed on its predecessors. The \hat{\gamma}_i and
\hat{d}_{j;a} may then be arranged in a system of equations in
the b_i and subsequently solved to yield the coefficients for
the desired model.
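
The sequential-regression identity used above, b_1 = \gamma_1 - d_{2;1} b_2, can be checked numerically with the Python sketch below. It deliberately swaps in ordinary least-squares slopes through the origin in place of the median-based slope of this section, because with least-squares slopes the sequential result must agree exactly with a direct two-carrier fit. The data, seed, and names are hypothetical; NumPy is assumed.

```python
import numpy as np

def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept (stand-in for the median slope)."""
    return (x * y).sum() / (x * x).sum()

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.6 * x1 + rng.normal(size=50)          # carriers deliberately interdependent
y = 3.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=50)

gamma1 = slope_through_origin(x1, y)         # y regressed on x1 alone
y_adj = y - gamma1 * x1                      # y adjusted for x1
d21 = slope_through_origin(x1, x2)           # x2 regressed on x1
x2_adj = x2 - d21 * x1                       # x2 adjusted for x1
b2 = slope_through_origin(x2_adj, y_adj)     # adjusted y on adjusted x2
b1 = gamma1 - d21 * b2                       # back-substitution

# Direct two-carrier least-squares fit for comparison.
direct, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
```

With the median-based slopes of the thesis the agreement would be approximate rather than exact, since those estimates are not least-squares projections.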
C.  TWO-CARRIER  EXPERIMENT


The two-carrier experiment was analogous to the
one-carrier tests. Coefficients b_0, b_1 and b_2 were fixed to
establish a known model

    y_i = 13 + 3(x_{i1} - \bar{x}_1) - 0.5(x_{i2} - \bar{x}_2)

The x_{i1} were the integers 1 through 20 in ascending order;
the x_{i2} were the same integers shuffled to establish
independence in the X-matrix columns. The "true" y_i were
then calculated and subsequently disturbed by the same
additive error \epsilon_i as in the one-carrier case.


Since  the  single-carrier  experiment  showed  little  change 
in  the  values  of  the  coefficients  from  the  fourth  iteration 
to  the  seventh,  the  two-carrier  iterations  were  terminated 
after  four  cycles  (or  convergence)  for  each  of  1000 
replications  for  each  of  the  five  distributions.  Only  final 
values  were  recorded  since  the  initial  guesses  tended  to  be 
somewhat  unstable  in  the  first  experiment. 


D.  RESULTS  OF  THE  TWO-CARRIER  EXPERIMENT


The  results  of  the  second  experiment  are  summarized  in 
Figures  20,  21  and  22  in  the  Appendix.  The  estimates  of  the 
coefficients  parallel  the  single-carrier  cases  exactly,  with 
the  exception  of  the  Cauchy  method  applied  to  uniform 
disturbances. The standard deviations of the Cauchy
estimates  for  the  coefficients  of  the  uniform-disturbed  data 
are  slightly  lower  than  in  the  single-carrier  case,  in 
contrast  to  the  general  trend  for  the  standard  deviations  to 
be  higher  for  the  two-carrier  problems.  While  there  seems 
to  be  some  interaction  between  the  carriers  which  raises  the 
standard  deviations  in  general,  the  use  of  two  carriers  may 
be  reducing  the  tendency  of  the  Cauchy  method  to  converge  to 
a local  solution. 
VI.  CONCLUSIONS


The robust method developed and tested in this paper
demonstrates extremely stable behavior over a variety of
distributions of random error. Traditional least-squares
estimation, on the other hand, is subject to potentially
large errors in its estimates of regression coefficients.
The method based on Cauchy-likelihood weighting has
consistently smaller error when outliers are present and
only slightly larger errors (though never any extreme
errors) when the error distribution is closer to the Normal
distribution.

The Cauchy-likelihood estimates appear to be very
slightly biased. The centers of the y_i tend to be estimated
too high, while the slopes of the carriers are consistently
low. Possibly, if the experiment were run again with the
signs of the error terms reversed, the apparent biasing
would also reverse to imply that the procedure is robustly
unbiased.


There are two drawbacks to the Cauchy method. The first
is its requirement for more calculations and intermediate
storage. The initial estimates of the coefficients alone
require more computer assets than least-squares needs for a
complete, though possibly erroneous, solution. As a general
rule, the robust Cauchy-likelihood procedure requires twice
the data storage capacity and takes five to six times as long
to run as a basic least-squares routine. Clearly, the large
reduction in the risk of obtaining seriously inaccurate
estimates of regression coefficients warrants the use of the
Cauchy, or some other robust procedure, in everyday data
analysis, even with the increase in computer requirements.

The other problem with using the Cauchy-likelihood
method is the possible convergence upon a local solution.
It should be noted, however, that traditional least-squares
will also produce erroneous results when used under the same
conditions which would cause the Cauchy-based technique to
stabilize at a local solution.
APPENDIX  A

HISTOGRAMS  OF  LEAST-SQUARES  ESTIMATES

Figure 11 - LEAST-SQUARES ESTIMATE OF b_1 WITH NORMAL ERROR

Figure 12 - LEAST-SQUARES ESTIMATE OF b_0 WITH CAUCHY ERROR

Figure 13 - LEAST-SQUARES ESTIMATE OF b_1 WITH CAUCHY ERROR

Figure 14 - LEAST-SQUARES ESTIMATE OF b_0 WITH UNIFORM ERROR

Figure 15 - LEAST-SQUARES ESTIMATE OF b_1 WITH UNIFORM ERROR

Figure 17 - LEAST-SQUARES ESTIMATE OF b_1 WITH "V" ERROR

Figure 18 - LEAST-SQUARES ESTIMATE OF b_0 WITH "Z" ERROR

True value = 13

                     Normal    Cauchy    Uniform   "V"       "Z"

Mean
  Cauchy             13.02     13.01     13.03     13.01     13.18
  Least-squares      13.00     298       12.99     1.4E9     14.3

Std. Dev.
  Cauchy             .417      .406      2.71      .405      .358
  Least-squares      .331      9240      1.69      1.8E10    2.78

Minimum
  Cauchy             11.66     11.52     4.68      11.49     12.69
  Least-squares      12.02     -4063     8.24      8.83      12.86

.10 Quantile
  Cauchy             12.51     12.54     9.54      12.55     12.83
  Least-squares      12.59     9.85      10.87     11.57     13.47

.25 Quantile
  Cauchy             12.74     12.76     11.21     12.75     12.93
  Least-squares      12.77     11.88     11.78     12.34     13.86

.50 Quantile
  Cauchy             13.02     13.00     13.02     12.99     13.08
  Least-squares      13.00     13.02     12.97     13.25     14.40

.75 Quantile
  Cauchy             13.31     13.27     14.87     13.26     13.32
  Least-squares      13.23     14.04     14.18     14.58     15.16

.90 Quantile
  Cauchy             13.56     13.52     16.58     13.51     13.64
  Least-squares      13.42     16.15     15.21     18.54     16.18

Maximum
  Cauchy             14.35     15.01     20.18     14.63     15.35
  Least-squares      14.04     29.22     18.30     2.3E11    89.14

Figure 20 - SUMMARY OF TWO-CARRIER b_0 DISTRIBUTION

True value = 3

                     Normal    Cauchy    Uniform   "V"       "Z"

Mean
  Cauchy             2.99      2.99      2.96      2.99      3.00
  Least-squares      3.00      16.28     2.99      1.2E7     2.99

Std. Dev.
  Cauchy             .071      .071      .446      .072      .044
  Least-squares      .057      354       .287      2.7E9     .271

Minimum
  Cauchy             2.73      2.69      1.67      2.64      2.77
  Least-squares      2.79      -65.17    2.00      -7E10     .144

.10 Quantile
  Cauchy             2.89      2.90      2.41      2.90      2.95
  Least-squares      2.93      2.47      2.63      2.52      2.77

.25 Quantile
  Cauchy             2.94      2.95      2.66      2.95      2.98
  Least-squares      2.96      2.84      2.80      2.86      2.91

.50 Quantile
  Cauchy             2.99      2.99      2.96      2.99      3.00
  Least-squares      3.00      3.00      2.99      3.00      3.00

.75 Quantile
  Cauchy             3.03      3.03      3.25      3.03      3.02
  Least-squares      3.04      3.15      3.18      3.12      3.08

.90 Quantile
  Cauchy             3.08      3.08      3.54      3.07      3.04
  Least-squares      3.07      3.37      3.37      3.37      3.20

Maximum
  Cauchy             3.24      3.35      4.19      3.36      3.35
  Least-squares      3.18      11140     3.85      3.97      6.29

Figure 21 - SUMMARY OF TWO-CARRIER b_1 DISTRIBUTION

True value = -0.5

                     Normal    Cauchy    Uniform   "V"       "Z"

Mean
  Cauchy             -.501     -.501     -.509     -.501     -.501
  Least-squares      -.502     49.29     -.503     -1.3E8    -.502

Std. Dev.
  Cauchy             .070      .069      .435      .069      .044
  Least-squares      .055      1576      .283      3.3E9     .443

Minimum
  Cauchy             -.767     -.873     -1.76     -.788     -.802
  Least-squares      -.665     -388      -1.29     -5E10     -3.77

.10 Quantile
  Cauchy             -.591     -.585     -1.08     -.587     -.547
  Least-squares      -.575     -.991     -.866     -.897     -.721

.25 Quantile
  Cauchy             -.550     -.542     -.824     -.543     -.518
  Least-squares      -.536     -.663     -.682     -.650     -.589

.50 Quantile
  Cauchy             -.499     -.499     -.499     -.499     -.501
  Least-squares      -.501     -.503     -.499     -.502     -.496

.75 Quantile
  Cauchy             -.455     -.460     -.214     -.458     -.482
  Least-squares      -.468     -.368     -.317     -.367     -.414

.90 Quantile
  Cauchy             -.415     -.418     -.062     -.418     -.457
  Least-squares      -.429     -.109     -.129     -.126     -.295

Maximum
  Cauchy             -.295     -.212     .619      -.234     -.294
  Least-squares      -.311     49823     .365      3.9E10    1.07

Figure 22 - SUMMARY OF TWO-CARRIER b_2 DISTRIBUTION